This is my PM566 HW5 website.
# Remove records that have no case number
# (logical indexing is used because -which() would drop every row if there were no NAs)
newcase_data=new[!is.na(new$case_in_country),c(2,3,5)]
dim(newcase_data)
## [1] 887 3
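A note on the filtering step: negative indexing with `which()` selects no rows at all when nothing is missing, because `which()` then returns an empty vector. The minimal sketch below uses a made-up data frame (`toy` is a hypothetical name, not the homework data) to contrast that with logical indexing.

```r
# Toy data with no missing values (hypothetical, not the homework data)
toy <- data.frame(id = 1:3, case_in_country = c(10, 20, 30))

# Negative which(): which() returns integer(0), so no rows are selected
dropped <- toy[-which(is.na(toy$case_in_country)), ]

# Logical indexing keeps every row whose value is not NA
kept <- toy[!is.na(toy$case_in_country), ]

nrow(dropped)  # 0
nrow(kept)     # 3
```

When the column does contain missing values, both forms return the same rows, so the logical form is the safer default.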
# Count the number of new cases for each reporting date
a=as.data.frame(table(newcase_data$`reporting date`))
a %>%
plot_ly(x = ~Var1, y = ~Freq, type = "scatter", mode = "lines") %>%
layout(xaxis = list(title="Date"), yaxis = list(title="Number of cases"), title = "Number of new cases outside of China")
From the plot we can see that, during the first two months of the COVID-19 pandemic, the number of new cases reported outside of China increased over time. This suggests that the virus had already spread beyond China to other parts of the world.
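One way to check a trend like this numerically is to parse the dates, count cases per day, and take a running total. The sketch below uses toy values, and the `"%m/%d/%Y"` date format is an assumption, not something confirmed for the homework data.

```r
# Toy reporting dates stored as text; the "%m/%d/%Y" format is an assumption
dates_txt <- c("1/20/2020", "1/20/2020", "2/1/2020",
               "1/25/2020", "2/1/2020", "2/1/2020")

# Parse to Date so the days are ordered chronologically, not alphabetically
dates <- as.Date(dates_txt, format = "%m/%d/%Y")

# Count cases per day and give the columns descriptive names
daily <- as.data.frame(table(dates), stringsAsFactors = FALSE)
names(daily) <- c("date", "n")
daily$date <- as.Date(daily$date)

# Running total of cases over time
daily$cumulative <- cumsum(daily$n)
daily
```

Plotting `cumulative` against `date` gives a curve that can only rise, which makes the overall growth easier to read than the day-to-day counts.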
new %>% plot_ly(y = ~exposure_start, type = "box", name = "Exposure Start Date") %>%
add_trace(y = ~symptom_onset, type = "box", name = "Symptom Onset Date")
We can see that there are around 14 days between the median exposure start date and the median symptom onset date, which is in line with the incubation period of up to 14 days reported for COVID-19.
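The gap read off the box plots is a difference between two medians, which is not necessarily the same as the median of the per-case delays. A minimal sketch of both calculations follows, using hypothetical dates and assuming the two columns are already of class `Date`.

```r
# Toy exposure and onset dates (hypothetical values, not the homework data)
toy <- data.frame(
  exposure_start = as.Date(c("2020-01-01", "2020-01-05", NA, "2020-01-10")),
  symptom_onset  = as.Date(c("2020-01-08", "2020-01-15", "2020-01-20", NA))
)

# Difference between the two medians, as compared in the box plots
median_gap <- as.numeric(difftime(
  median(toy$symptom_onset, na.rm = TRUE),
  median(toy$exposure_start, na.rm = TRUE),
  units = "days"
))

# Per-case delay, using only the rows where both dates are known
both <- !is.na(toy$exposure_start) & !is.na(toy$symptom_onset)
delay <- as.numeric(difftime(toy$symptom_onset[both],
                             toy$exposure_start[both],
                             units = "days"))
median_delay <- median(delay)

median_gap    # 10
median_delay  # 8.5
```

The two numbers differ here because the medians are taken over different sets of cases, so the per-case version is probably the closer estimate of an incubation period.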
b=as.data.frame(table(newcase_data$country))
b %>%
group_by(Var1) %>%
plot_ly(labels = ~ Var1, values = ~Freq, textposition = "inside") %>%
add_pie(hole = 0.6) %>%
layout(title = "Donut chart of total number of cases outside of China")
We can see that the countries with the largest numbers of new cases are mostly those located near China. This is consistent with China being the place where the COVID-19 outbreak originated.
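To back up a reading of the donut chart with numbers, the country counts can be sorted and converted to percentage shares. The sketch below uses a made-up `country` vector for illustration only.

```r
# Toy country column (hypothetical values, not the homework data)
country <- c("Japan", "Singapore", "Japan", "France", "Singapore", "Japan")

# Count cases per country and sort from largest to smallest
counts <- sort(table(country), decreasing = TRUE)

# Share of all cases in percent, rounded to one decimal place
share <- round(100 * prop.table(counts), 1)

counts
share
```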